345 research outputs found
Computing Covers Using Prefix Tables
An \emph{indeterminate string} on an alphabet is a
sequence of nonempty subsets of ; is said to be \emph{regular} if
every subset is of size one. A proper substring of regular is said to
be a \emph{cover} of iff for every , an occurrence of in
includes . The \emph{cover array} of is
an integer array such that is the longest cover of .
Fifteen years ago a complex, though nevertheless linear-time, algorithm was
proposed to compute the cover array of regular based on prior computation
of the border array of . In this paper we first describe a linear-time
algorithm to compute the cover array of regular string based on the prefix
table of . We then extend this result to indeterminate strings.Comment: 14 pages, 1 figur
Inferring an Indeterminate String from a Prefix Graph
An \itbf{indeterminate string} (or, more simply, just a \itbf{string}) \s{x}
= \s{x}[1..n] on an alphabet is a sequence of nonempty subsets of
. We say that \s{x}[i_1] and \s{x}[i_2] \itbf{match} (written
\s{x}[i_1] \match \s{x}[i_2]) if and only if \s{x}[i_1] \cap \s{x}[i_2] \ne
\emptyset. A \itbf{feasible array} is an array \s{y} = \s{y}[1..n] of
integers such that \s{y}[1] = n and for every , \s{y}[i] \in
0..n\- i\+ 1. A \itbf{prefix table} of a string \s{x} is an array \s{\pi} =
\s{\pi}[1..n] of integers such that, for every , \s{\pi}[i] = j
if and only if \s{x}[i..i\+ j\- 1] is the longest substring at position
of \s{x} that matches a prefix of \s{x}. It is known from \cite{CRSW13} that
every feasible array is a prefix table of some indetermintate string. A
\itbf{prefix graph} \mathcal{P} = \mathcal{P}_{\s{y}} is a labelled simple
graph whose structure is determined by a feasible array \s{y}. In this paper we
show, given a feasible array \s{y}, how to use \mathcal{P}_{\s{y}} to
construct a lexicographically least indeterminate string on a minimum alphabet
whose prefix table \s{\pi} = \s{y}.Comment: 13 pages, 1 figur
Deep network with score level fusion and inference-based transfer learning to recognize leaf blight and fruit rot diseases of eggplant
Eggplant is a popular vegetable crop. Eggplant yields can be affected by various diseases. Automatic detection and recognition of diseases is an important step toward improving crop yields. In this paper, we used a two-stream deep fusion architecture, employing CNN-SVM and CNN-Softmax pipelines, along with an inference model to infer the disease classes. A dataset of 2284 images was sourced from primary (using a consumer RGB camera) and secondary sources (the internet). The dataset contained images of nine eggplant diseases. Experimental results show that the proposed method achieved better accuracy and lower false-positive results compared to other deep learning methods (such as VGG16, Inception V3, VGG 19, MobileNet, NasNetMobile, and ResNet50)
String Comparison in -Order: New Lexicographic Properties & On-line Applications
-order is a global order on strings related to Unique Maximal
Factorization Families (UMFFs), which are themselves generalizations of Lyndon
words. -order has recently been proposed as an alternative to
lexicographical order in the computation of suffix arrays and in the
suffix-sorting induced by the Burrows-Wheeler transform. Efficient -ordering
of strings thus becomes a matter of considerable interest. In this paper we
present new and surprising results on -order in strings, then go on to
explore the algorithmic consequences
Algorithms to Compute the Lyndon Array
We first describe three algorithms for computing the Lyndon array that have
been suggested in the literature, but for which no structured exposition has
been given. Two of these algorithms execute in quadratic time in the worst
case, the third achieves linear time, but at the expense of prior computation
of both the suffix array and the inverse suffix array of x. We then go on to
describe two variants of a new algorithm that avoids prior computation of
global data structures and executes in worst-case n log n time. Experimental
evidence suggests that all but one of these five algorithms require only linear
execution time in practice, with the two new algorithms faster by a small
factor. We conjecture that there exists a fast and worst-case linear-time
algorithm to compute the Lyndon array that is also elementary (making no use of
global data structures such as the suffix array)
Automatic log parser to support forensic analysis
Event log parsing is a process to split and label each field in a log entry. Existing approaches commonly use regular expressions or parsing rules to extract the fields. However, such techniques are time-consuming as a forensic investigator needs to define a new rule for each log file type. In this paper, we present a tool, namely nerlogparser, to parse the log entries automatically, where log parsing is modeled as a named entity recognition problem. We use a deep machine learning technique, specifically the bidirectional long short-term memory networks, as the underlying architecture for this purpose. Unlike existing tools, nerlogparser is a fully automatic tool as the investigators do not need to define any parsing rules and it is generic as there is only one model to parse various types of log files. Experimental results show that nerlogparser achieves superior performance compared with other traditional machine learning methods
Graph clustering and anomaly detection of access control log for forensic purposes
Attacks on operating system access control have become a significant and increasingly common problem. This type of security threat is recorded in a forensic artifact such as an authentication log. Forensic investigators will generally examine the log to analyze such incidents. An anomaly is highly correlated to an attacker's attempts to compromise the system. In this paper, we propose a novel method to automatically detect an anomaly in the access control log of an operating system. The logs will be first preprocessed and then clustered using an improved MajorClust algorithm to get a better cluster. This technique provides parameter-free clustering so that it automatically can produce an analysis report for the forensic investigators. The clustering results will be checked for anomalies based on a score that considers some factors such as the total members in a cluster, the frequency of the events in the log file, and the inter-arrival time of a specific activity. We also provide a graph-based visualization of logs to assist the investigators with easy analysis. Experimental results compiled on an open dataset of a Linux authentication log show that the proposed method achieved the accuracy of 83.14% in the authentication log dataset
Frame-Wise dynamic threshold based polyphonic acoustic event detection
Acoustic event detection, the determination of the acoustic event type and the localisation of the event, has been widely applied in many real-world applications. Many works adopt multi-label classification techniques to perform the polyphonic acoustic event detection with a global threshold to detect the active acoustic events. However, the global threshold has to be set manually and is highly dependent on the database being tested. To deal with this, we replaced the fixed threshold method with a frame-wise dynamic threshold approach in this paper. Two novel approaches, namely contour and regressor based dynamic threshold approaches are proposed in this work. Experimental results on the popular TUT Acoustic Scenes 2016 database of polyphonic events demonstrated the superior performance of the proposed approaches
Consumer perceptions in the adoption of the electronic health records in Australia: A pilot study
The paper reports an empirical investigation of the factors affecting consumer perceptions of the adoption of Electronic Health Records in Australia. This paper also details the processes involved in the pilot testing of the instrument where it has been pilot-tested to a convenience sample by sending individual postal survey envelopes to shortlisted community organisations in Australia.
Reliability analysis to check the internal consistency was performed using the Cronbach’s alpha. Content validity was achieved by reviewing the instrument with a panel of experts. The results of this pilot study proved the feasibility of a full-scale study and these could be used as the basis for refinement of the instrument. Based upon the outcome of validity and reliability testing, items for the final instrument were identified. The findings showed that the tested model does fit the data well and has a significant and positive impact on the consumer’s attitude in using the EHR
- …